
    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

    Large language models (LLMs) such as GPT-4 have caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.

    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

    Chemistry and materials science are complex. Recently, there have been great successes in addressing this complexity using data-driven or computational techniques. Yet the need for input structured in very specific forms and the ever-growing number of tools create usability and accessibility challenges. Combined with the reality that much data in these disciplines is unstructured, these challenges limit the effectiveness of such tools. Motivated by recent works indicating that large language models (LLMs) might help address some of these issues, we organized a hackathon event on the applications of LLMs in chemistry, materials science, and beyond. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.

    Dari Dataset for Coreference Resolution

    DariCoref is a Dari corpus annotated for anaphoric relations; all documents were collected from Dari VOA and Azadi Radio. The annotation scheme follows OntoNotes and WikiCoref. Each markable is annotated with a coreference type (Identical, Attributive, or Copular) and a mention type (Named Entity, Noun Phrase, or Pronominal). Resources for Dari texts are scarce, and this is the first annotation effort of its kind; it concentrates on a very specific type of written text, mainly newswire. We therefore present a freely available resource for developing coreference resolution algorithms dedicated to Dari texts. The annotation was carried out with the MMAX2 tool.
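    To make the two label dimensions concrete, here is a minimal Python sketch of how annotated markables might be represented in memory and grouped into coreference chains. The `Markable` class, its fields (including the `chain_id` link), and `build_chains` are hypothetical illustrations; they do not reproduce the MMAX2 export format, and the English tokens are placeholders for Dari text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Label sets named in the DariCoref description.
COREF_TYPES = {"Identical", "Attributive", "Copular"}
MENTION_TYPES = {"Named Entity", "Noun Phrase", "Pronominal"}


@dataclass(frozen=True)
class Markable:
    """One annotated mention. Field names are hypothetical, not the MMAX2 schema."""
    start: int          # index of the first token of the mention
    end: int            # index of the last token (inclusive)
    chain_id: str       # hypothetical id linking coreferent mentions
    coref_type: str     # one of COREF_TYPES
    mention_type: str   # one of MENTION_TYPES

    def __post_init__(self):
        # Reject labels outside the scheme so annotation typos surface early.
        if self.coref_type not in COREF_TYPES:
            raise ValueError(f"unknown coreference type: {self.coref_type}")
        if self.mention_type not in MENTION_TYPES:
            raise ValueError(f"unknown mention type: {self.mention_type}")
        if self.end < self.start:
            raise ValueError("end must not precede start")


def build_chains(markables):
    """Group markables by chain id, each chain ordered by position in the text."""
    chains = defaultdict(list)
    for m in markables:
        chains[m.chain_id].append(m)
    return {cid: sorted(ms, key=lambda m: (m.start, m.end))
            for cid, ms in chains.items()}


# Usage with placeholder tokens: "he" refers back to "Ahmad".
tokens = ["Ahmad", "said", "he", "would", "come"]
marks = [
    Markable(2, 2, "c1", "Identical", "Pronominal"),
    Markable(0, 0, "c1", "Identical", "Named Entity"),
]
chains = build_chains(marks)
print([" ".join(tokens[m.start:m.end + 1]) for m in chains["c1"]])  # ['Ahmad', 'he']
```

    Validating labels at construction time is one design choice among several; it keeps malformed markables out of any downstream resolver.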

    Dari Dataset for Part-of-Speech

    The file is encoded as UTF-8 with Arabic characters. This dataset is intended for part-of-speech tagging of the Dari language and can be used for many other natural language processing tasks on Dari text. The dataset size is 12K, and it was annotated manually. The tagset used in this dataset is that of the Universal Tagger.
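    The description does not specify the file's column layout, so the following Python sketch assumes one `token<TAB>tag` pair per line with a blank line between sentences. The file name, the sample sentence, and the tag names are hypothetical. It shows reading the UTF-8 data into sentences and counting tag frequencies.

```python
import io
from collections import Counter


def read_tagged(lines):
    """Parse 'token<TAB>tag' lines into sentences; a blank line ends a sentence.

    The column layout is an assumption, not the documented file format.
    """
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.rsplit("\t", 1)
        current.append((token, tag))
    if current:  # file may not end with a blank line
        sentences.append(current)
    return sentences


def tag_counts(sentences):
    """Count how often each tag occurs across all sentences."""
    return Counter(tag for sent in sentences for _, tag in sent)


# Reading the real file would pass an explicit encoding, e.g.
# open("dari_pos.txt", encoding="utf-8")  # hypothetical file name
sample = io.StringIO("کتاب\tNOUN\nخوب\tADJ\nاست\tVERB\n\nمن\tPRON\n")  # hypothetical tags
sents = read_tagged(sample)
print(len(sents), tag_counts(sents)["NOUN"])  # 2 1
```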

    Dari Language Stopword Lists

    This is a list of stop words collected from books and newspapers that follow pure Dari orthography. These words are frequently used in Dari but carry little meaningful information for some language modeling tasks. Excluding them from the analysis reduces noise in textual data. Suggestions for changing or supplementing the list are always welcome.
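    A minimal Python sketch of applying such a list, assuming the list is stored one word per line. The handful of common Dari function words used here are illustrative only and are not claimed to be entries of the published list.

```python
def load_stopwords(lines):
    """Build a stop-word set from one-word-per-line text (assumed layout)."""
    return {w.strip() for w in lines if w.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only tokens that are not in the stop-word set (exact match)."""
    return [t for t in tokens if t not in stopwords]


# Illustrative function words ("and", "in", "from", "to"), plus a blank line
# that load_stopwords skips.
stopwords = load_stopwords(["و\n", "در\n", "از\n", "به\n", "\n"])
tokens = ["کتاب", "در", "خانه", "است"]  # "the book is in the house"
print(remove_stopwords(tokens, stopwords))
```

    Because matching is by exact string, text that mixes Arabic and Persian code points for visually similar letters (for example Arabic yeh U+064A versus Farsi yeh U+06CC) would need normalization before filtering.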

    Dari Dataset for Named Entity Recognition DariNER1

    The file is encoded as UTF-8 with Arabic characters. DariNER1 is a collection of data from the Dari newswire domain. The dataset is annotated with the IO encoding scheme for four named-entity types: Person, Location, Organization, and Miscellaneous. The data follow pure Dari orthography and were collected from Dari VOA news, Azadi Radio, and Kankor (the University National Entry Exam) from Higher Education of Afghanistan.
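    Under the IO scheme each token is tagged either `O` (outside) or `I-` plus an entity type, with no separate begin tag. The Python sketch below converts such a tag sequence into entity spans. The label strings (`I-PER`, `I-LOC`, and so on) are assumed abbreviations, since the exact labels used in DariNER1 are not given here, and the sample sentence is hypothetical.

```python
def io_to_spans(tags):
    """Convert IO tags ('O', 'I-PER', ...) into (start, end_exclusive, type) spans.

    A span ends whenever the entity type changes, so two adjacent entities
    of the same type merge into one span under IO.
    """
    spans = []
    start, current = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        etype = tag[2:] if tag.startswith("I-") else None
        if etype != current:
            if current is not None:
                spans.append((start, i, current))
            start, current = (i, etype) if etype else (None, None)
    return spans


# Hypothetical sentence: "Ahmad Karimi in Kabul".
tokens = ["احمد", "کریمی", "در", "کابل"]
tags = ["I-PER", "I-PER", "O", "I-LOC"]
for s, e, t in io_to_spans(tags):
    print(t, " ".join(tokens[s:e]))
```

    The merging of adjacent same-type entities is the main trade-off of IO against BIO-style schemes, which add a begin tag to separate them.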

    Rare predicted loss-of-function variants of type I IFN immunity genes are associated with life-threatening COVID-19

    Background: We previously reported that impaired type I interferon (IFN) activity, due to inborn errors of TLR3- and TLR7-dependent type I IFN immunity or to autoantibodies against type I IFN, accounts for 15-20% of cases of life-threatening COVID-19 in unvaccinated patients. The determinants of life-threatening COVID-19 therefore remain to be identified in ~80% of cases.
    Methods: We report here a genome-wide rare-variant burden association analysis in 3269 unvaccinated patients with life-threatening COVID-19 and 1373 unvaccinated SARS-CoV-2-infected individuals without pneumonia. Among the 928 patients tested for autoantibodies against type I IFN, a quarter (234) were positive and were excluded.
    Results: No gene reached genome-wide significance. Under a recessive model, the most significant gene with at-risk variants was TLR7, with an odds ratio (OR) of 27.68 (95% CI 1.5-528.7, P = 1.1 × 10⁻⁴) for biochemically loss-of-function (bLOF) variants. We replicated the enrichment in rare predicted LOF (pLOF) variants at 13 influenza susceptibility loci involved in TLR3-dependent type I IFN immunity (OR = 3.70 [95% CI 1.3-8.2], P = 2.1 × 10⁻⁴). This enrichment was further strengthened by (1) adding the recently reported TYK2 and TLR7 COVID-19 loci, particularly under a recessive model (OR = 19.65 [95% CI 2.1-2635.4], P = 3.4 × 10⁻³), and (2) considering as pLOF the branchpoint variants with potentially strong impacts on splicing among the 15 loci (OR = 4.40 [95% CI 2.3-8.4], P = 7.7 × 10⁻⁸). Finally, the patients with pLOF/bLOF variants at these 15 loci were significantly younger (mean age [SD] = 43.3 [20.3] years) than the other patients (56.0 [17.3] years; P = 1.68 × 10⁻⁵).
    Conclusions: Rare variants of TLR3- and TLR7-dependent type I IFN immunity genes can underlie life-threatening COVID-19, particularly with recessive inheritance, in patients under 60 years old.
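    The results are reported as odds ratios with 95% confidence intervals. As a generic illustration only, the Python sketch below computes an odds ratio and a Woolf (log-scale Wald) interval from a 2×2 carrier/case table. The abstract does not state how its estimates were obtained, so this is not presented as the study's method, and the counts in the usage line are made up.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale Wald) CI for a 2x2 table:

                    carrier   non-carrier
        cases          a           b
        controls       c           d

    All four counts must be positive; zero cells need a continuity
    correction (e.g. adding 0.5 to every cell), which is not applied here.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Made-up counts: 12 of 1000 cases and 2 of 1000 controls carry a variant.
or_, lo, hi = odds_ratio_wald(12, 988, 2, 998)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

    The standard error is dominated by the reciprocal of the smallest cell, so when only a handful of individuals carry a variant the interval becomes very wide, which is consistent with bounds such as 1.5-528.7.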